












REPORT RESUMES

ED 010 609    24

THE PROBLEM OF CLASSIFYING MEMBERS OF A SINGLE POPULATION
INTO GROUPS. STATISTICAL MODELS FOR THE EVALUATION AND
INTERPRETATION OF EDUCATIONAL CRITERIA, PART II.

BY- SAW, J.G.  FLORA, R.E.

VIRGINIA POLYTECHNIC INST., BLACKSBURG

REPORT NUMBER CRP-1132-PT-2    PUB DATE 66

REPORT NUMBER BR-5-0224

EDRS PRICE MF-$0.18 HC-$3.36 84P.

DESCRIPTORS- *MATHEMATICAL MODELS, *EDUCATIONAL RESEARCH,
MATHEMATICAL APPLICATIONS, CALCULATION, MEASUREMENT
TECHNIQUES, *PREDICTIVE ABILITY (TESTING), *PREDICTIVE
MEASUREMENT, PREDICTIVE VALIDITY, *STATISTICAL ANALYSIS,
BLACKSBURG, VIRGINIA



AN ALTERNATIVE MATHEMATICAL APPROACH TO THE PROBLEM OF
CLASSIFICATION OF AN INDIVIDUAL BY VECTOR SCORES WAS PROPOSED
FOR THOSE CASES WHEN THE USUAL CLASSICAL MODEL DOES NOT HOLD.
THE NEW APPROACH ASSUMED THAT, WITHIN A GROUP HOMOGENEOUS IN
REGARD TO BACKGROUND, TWO DISTINCT POPULATIONS OF PASS AND
FAIL DO NOT EXIST, BUT RATHER A CONTINUOUS SPECTRUM. MAKING
USE OF THIS NEW MODEL, THE INVESTIGATORS WERE ABLE TO PROPOSE
A METHOD OF EVALUATING THE VECTOR SCORES ON MATHEMATICS
DEPARTMENT ADMISSIONS TESTS IN RESPECT TO THEIR VALUE AS
INDICATORS OF THE ULTIMATE WORTH OR PERFORMANCE OF THE
INDIVIDUAL. THE EFFICIENCY OF THE NEW METHOD WAS FOUND TO BE
QUITE HIGH. RELATED REPORTS ARE ED 003 044, ED 003 045, AND
ED 003 046. (GO)








STATISTICAL MODELS FOR THE EVALUATION AND
INTERPRETATION OF EDUCATIONAL CRITERIA

Cooperative Research Project Number 1132

by

J. G. Saw and R. E. Flora

Virginia Polytechnic Institute
Blacksburg, Virginia
1966

PART II. THE PROBLEM OF CLASSIFYING MEMBERS
OF A SINGLE POPULATION INTO GROUPS.



The presentation of these notes was supported through the Cooperative
Research Program of the Office of Education, U. S. Department of
Health, Education, and Welfare.















ACKNOWLEDGEMENTS 

The authors would like to thank the U. S. Department of
Health, Education, and Welfare for the opportunity to
investigate this particular problem and for their financial
support. Thanks are extended to the faculty of the
Statistics Department, Virginia Polytechnic Institute,
for their interest and to Mrs. Ann Harrison for her




INDEX

Chapter 1. DATA PRESENTED BY GROUPS. DISCRIMINATION

    1.1  Introduction.
    1.2  The Classical Approach.
    1.3  An Alternative Approach.

Chapter 2. NULL DISTRIBUTION

    2.1  The Null Distribution of the Largest Latent Root.
    2.2  Case c=2.
    2.3  Case c>2.

Chapter 3. THE ALTERNATIVE DISTRIBUTION. c=2

    3.1  Preliminaries.
    3.2  An Underlying Model.
    3.3  Development.
    3.4  The Case p=1; Moments of a Correlation Coefficient.
    3.5  Moments of $\eta^2$. Case $k_1 = n-1$.
    3.6  Moments of $\eta^2$; General Two Group Situation.

Chapter 4. THE POWER OF THE TEST OF $R_0 = 0$.

    4.1  Introduction.
    4.2  Large Sample Behaviour.
    4.3  Small Sample Behaviour.
    4.4  Numerical Values of Power in a Specific Case.

SUMMARY

APPENDIX A  Moments of $1-R^2$

APPENDIX B  Values of $H_j(p_k, a, b)$












CHAPTER 1

DATA PRESENTED BY GROUPS. DISCRIMINATION

1.1 Introduction

Frequently data are presented or collected by groups in an attempt
to comment on possible differences between the groups. For example,
the performance in a battery of "tests" may be tabulated for male and
female, married and single, in-state or out-of-state students, and a
further classification may be according as the student was successful
(obtained a B.S., for instance) or was unsuccessful (in this respect).

Whilst it is likely true that observations made within the in-state,
male, married category, or the in-state, female, single category, are
approximately normally distributed, it is unlikely indeed that tests
relating to mental performance are normally distributed within the
class of people who obtain a bachelor's degree. This fact is recognized
quite easily by university teachers. Freshman performance in,
say, mathematics is distributed quite close to normality; due to
drop-outs over the following years, some of the weaker students are not
present in the senior year, resulting in a discernible skewness in the
grades, the better students being present in larger numbers. The
grades of students accepted for graduate work are extremely skew, and
in advanced graduate courses a final grade of A usually dominates all
other grades; in fact, it is rare indeed to find a conscientious student
in an advanced graduate course who does not achieve a grade of B or
better. It would certainly be unwise therefore to work with data
relating to mental performance of graduate students under the assumption
that these data were normally distributed.



It seems reasonable to assume that data collected on a student entering
college would have a multivariate normal distribution (in the United
States, but possibly not in, for example, many European countries where
students have to meet qualifications in order to be permitted to
embark on a higher education). Unfortunately, data pertaining to the
weaker students is unlikely to be available in, say, the senior year due
to the possibility of being a drop-out. However, a student may often
be categorized in a broad way; for example it is possible to adopt
the system of categorization

(i) student successful enough to proceed to a higher degree

(ii) student successful at the bachelors level but not likely
to succeed in graduate school

(iii) students weak in their work at the bachelors level but who
were not required to repeat or drop out

(iv) students who dropped out.

For every student then we have a vector of correlated normal variates
x and an "index" d such that d takes on certain values according as
the student belongs to group (i), (ii), (iii), or (iv) above. This
should be contrasted with Part IV of this contract wherein each
broad category contained only one student or vector observation and
d took on the set of integers 1,2,3,...,n where n was the number of
students sampled.

The classical approach to this problem has been to assume that x|d
has a multivariate normal density, that is, observations within a
broad category are normally distributed. This has been discussed
in the previous paragraphs of this chapter and we reject the
premise as unlikely.

Before presenting an analysis of the problem we define our data
and notation. Suppose a sample of n individuals is available.
Their performance on a battery of p tests is noted and is represented
by the p elements of the vector $x$ ($x' = (x_1, x_2, \ldots, x_p)$). In addition
each student is categorized as described earlier. We may conveniently
assign a characteristic random variable d to this student where d may
take any of c different (scalar) values $d_1$, $d_2$, etc. Our problem
is to find:

(a) which elements of x have the same distribution in each broad
category

(b) which elements of x differ strikingly in different broad
categories

(c) what linear combination of the elements of x is most indicative
of the category to which the student is assigned.

We will list the vector observations for the n students as

$x_{11}, x_{12}, \ldots, x_{1k_1};\; x_{21}, x_{22}, \ldots, x_{2k_2};\; \ldots;\; x_{c1}, x_{c2}, \ldots, x_{ck_c}$;

$k_i$ students are assigned to category i and $k_1+k_2+\ldots+k_c=n$. Sometimes
the set $\{k_1, k_2, \ldots, k_c\}$ are themselves random variables; in some
circumstances they may be fixed, for example in the case where we
are concerned with the best three students we may set $c=2$, $k_1=n-3$, $k_2=3$.








1.2 The Classical Approach

In the classical approach it is assumed that

(1.2.1)  $x_{ij} \sim N_p(\mu_i; V)$,  $j=1,2,\ldots,k_i$;  $i=1,2,\ldots,c$

that is $x_{ij}$ has a multivariate normal density with mean $\mu_i$ (possibly
different for each category) but with a dispersion matrix which is
the same (V) for all categories. We have envisaged situations wherein
the assumptions of normality are invalid; the assumption of equal
dispersion matrix is of course seldom ever valid.

It is required to find a system of multipliers, $\beta$, such that the
scalar quantity

(1.2.2)  $Z_{ij} = \beta' x_{ij}$

takes on distinctive values as i varies. Now if (1.2.1) holds then
for fixed $\beta$

(1.2.3)  $Z_{ij} \sim N_1(\beta'\mu_i;\; \beta'V\beta)$,  $j=1,2,\ldots,k_i$;  $i=1,2,\ldots,c$

so that the problem is that of selecting a $\beta$ such that the $\beta'\mu_i$ differ
as widely as possible. Of course the $\mu_i$ are unknown (otherwise
there would be no problem) and the common dispersion matrix V is assumed
unknown too.

In a likelihood test of the hypothesis that the $\beta'\mu_i$ are all equal, one
would construct for any given, fixed $\beta$

(1.2.5)  $F(\beta) = \dfrac{h/(c-1)}{e/(n-c)}$

where

(1.2.6)  $h = \sum_{i=1}^{c} (\bar Z_{i\cdot} - \bar Z_{\cdot\cdot})^2 k_i$

(1.2.7)  $e = \sum_i \sum_j (Z_{ij} - \bar Z_{i\cdot})^2$.

Clearly $F(\beta)$ will take on small or large values according to the
value of $\beta$ since if c<p we could find (if the $\mu_i$ were
known) a vector $\beta$ such that the $\beta'\mu_i$ coincide for all $i=1,2,\ldots,c$.
Alternatively, it would be possible to select a $\beta$ such that the $\beta'\mu_i$ differ.
Classically one chooses the vector $\hat\beta$ which maximizes the
value of the ratio h/e.



Now

(1.2.8)  $h = \sum_i (\bar Z_{i\cdot} - \bar Z_{\cdot\cdot})^2 k_i = \beta'\{\sum_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})(\bar x_{i\cdot} - \bar x_{\cdot\cdot})' k_i\}\beta = \beta'H\beta$ (say),

(1.2.9)  $e = \sum_i\sum_j (Z_{ij} - \bar Z_{i\cdot})^2 = \beta'\{\sum_i\sum_j (x_{ij} - \bar x_{i\cdot})(x_{ij} - \bar x_{i\cdot})'\}\beta = \beta'E\beta$ (say).

We therefore seek that $\beta$ which maximizes

(1.2.10)  $\dfrac{\beta'H\beta}{\beta'E\beta}$

where

(1.2.11)  $H = \sum_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})(\bar x_{i\cdot} - \bar x_{\cdot\cdot})' k_i$,
          $E = \sum_i\sum_j (x_{ij} - \bar x_{i\cdot})(x_{ij} - \bar x_{i\cdot})'$

so that H and E are symmetric p×p matrices with H singular
whenever c<p. We will hold n>p>c so that E will be non-singular
and H singular.

It follows quickly that the required value of $\beta$
is given by

(1.2.12)  $(H - \theta E)\hat\beta = 0$

where $\theta$ is the largest root of

(1.2.13)  $|H - \theta E| = 0$.

To emphasise we repeat the assumptions made to get to this
result.

Assumption I:

All vectors $x_{ij}$ have a multivariate normal distribution.

Assumption II:

Each vector $x_{ij}$ has the same dispersion matrix.

and also we repeat that if $x_{ij}$ represent measures of intelligence
and the categories are defined by differences in intelligence
then neither assumption is likely to be valid.
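
The construction of H and E in (1.2.11), and the ratio h/e that the classical method maximizes, can be sketched as follows. This is a minimal illustration only: the function names and the small two-group data set are hypothetical and do not come from the report.

```python
def scatter_matrices(groups):
    """Between-group (H) and within-group (E) sums of squares and products.

    groups: list of c groups, each a list of p-element score vectors.
    Follows the definitions in (1.2.11).
    """
    p = len(groups[0][0])
    n = sum(len(g) for g in groups)
    grand = [sum(x[a] for g in groups for x in g) / n for a in range(p)]
    H = [[0.0] * p for _ in range(p)]
    E = [[0.0] * p for _ in range(p)]
    for g in groups:
        k = len(g)
        mean = [sum(x[a] for x in g) / k for a in range(p)]
        for a in range(p):
            for b in range(p):
                H[a][b] += k * (mean[a] - grand[a]) * (mean[b] - grand[b])
                E[a][b] += sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in g)
    return H, E


def ratio(beta, H, E):
    """h/e = beta'H beta / beta'E beta for a trial vector of multipliers."""
    p = len(beta)

    def quad(M):
        return sum(beta[a] * M[a][b] * beta[b] for a in range(p) for b in range(p))

    return quad(H) / quad(E)


# Hypothetical data: c = 2 groups of 3 students, p = 2 tests.
groups = [[(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)],
          [(4.0, 6.0), (6.0, 5.0), (5.0, 7.0)]]
H, E = scatter_matrices(groups)
first_test_only = ratio((1.0, 0.0), H, E)   # h/e when only test 1 is used
```

Trying different trial vectors in `ratio` shows how h/e depends on the multipliers; the classical discriminant is the vector for which this ratio is largest.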



1.3 An alternative approach

We now make less restrictive assumptions to meet the case
where the mode of categorization is heavily related to the
observed variates as described in the last paragraph of
section 1.2.

We assume that if an individual is randomly drawn from
some population then prior to categorization, x, the vector
of observations, has a multivariate density. We shall not
for the moment assume a specific type of multivariate density,
only that a vector of means and a dispersion matrix V
exist for this density. The individual is now categorized
and a value, d, characteristic of the category is assigned.
Our n observations can now be represented as

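$\{d_i;\, x_{ij}\}$,  $j=1,2,\ldots,k_i$;  $i=1,2,\ldots,c$.

We now seek a vector $\beta$ and a set of values $d_1, d_2, \ldots, d_c$ such
that the correlation between the scalar quantities $d_i$ and
$\beta'x_{ij}$ has as large a value as possible when taken over the
n individuals. Let $(\{\hat d_i\}_{i=1}^{c}, \hat\beta)$ be the required system; then

(1.3.1)  $\dfrac{\sum_i\sum_j \hat d_i\, \hat\beta' x_{ij}}{\sqrt{\hat\beta' \sum_i\sum_j (x_{ij}-\bar x_{\cdot\cdot})(x_{ij}-\bar x_{\cdot\cdot})' \hat\beta}} \;>\; \dfrac{\sum_i\sum_j d_i\, \beta' x_{ij}}{\sqrt{\beta' \sum_i\sum_j (x_{ij}-\bar x_{\cdot\cdot})(x_{ij}-\bar x_{\cdot\cdot})' \beta}}$

whenever $(\{\hat d_i\}, \hat\beta) \neq (\{d_i\}, \beta)$.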



It is convenient to "standardize" the $\{d_i\}$ by requiring

(1.3.2)  $\sum_i d_i k_i = 0$,  $\sum_i d_i^2 k_i = 1$;

this leads to simplifications in the expressions for the
correlations and also in the further development.

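The numerator of the right-hand side of (1.3.1) can be
written

$\beta' \sum_i d_i \bar x_{i\cdot} k_i = \beta' \sum_i d_i k_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})$

by virtue of (1.3.2).

Writing $T = \sum_i\sum_j (x_{ij} - \bar x_{\cdot\cdot})(x_{ij} - \bar x_{\cdot\cdot})'$
the denominator is then $\sqrt{\beta'T\beta}$.

Our objective is to maximize $\mathrm{corr}(\{d_i\}, \beta)$ where

(1.3.3)  $\mathrm{corr}(\{d_i\}, \beta) = \dfrac{\beta' \sum_i d_i k_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})}{\sqrt{\beta'T\beta}}$

subject to $\sum_i d_i k_i = 0$; $\sum_i d_i^2 k_i = 1$ over choice of $\{d_i\}_{i=1}^{c}$ and $\beta$.
Since the scale of $\beta$ is immaterial we choose to maximize
$\mathrm{corr}(\{d_i\}, \beta)$ subject to $\beta'T\beta = 1$.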

Case c=2

In the case c=2, the conditions $\sum_{i=1}^{2} d_i k_i = 0$ and $\sum_{i=1}^{2} d_i^2 k_i = 1$ are
sufficient to determine $d_1$ and $d_2$ since we have

(1.3.4)  $d_1 k_1 + d_2 k_2 = 0$
         $d_1^2 k_1 + d_2^2 k_2 = 1$
         $k_1 + k_2 = n$;

it follows that

(1.3.5)  $d_1 = -\sqrt{k_2/(k_1 n)}$
         $d_2 = \sqrt{k_1/(k_2 n)}$.
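
As a quick numerical check of (1.3.5), the sketch below computes the two scores for given group sizes and confirms both standardizing conditions of (1.3.4). The function name and the group sizes used are illustrative assumptions.

```python
import math


def two_group_scores(k1, k2):
    """Scores d1, d2 of (1.3.5) for groups of sizes k1 and k2."""
    n = k1 + k2
    d1 = -math.sqrt(k2 / (k1 * n))
    d2 = math.sqrt(k1 / (k2 * n))
    return d1, d2


# Illustrative sizes: the "best three" out of n = 18 students.
k1, k2 = 15, 3
d1, d2 = two_group_scores(k1, k2)
sum_dk = d1 * k1 + d2 * k2              # should vanish
sum_d2k = d1 ** 2 * k1 + d2 ** 2 * k2   # should equal one
```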



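In this event

(1.3.6)  $\{\mathrm{corr}(d_1, d_2, \beta)\}^2 = \dfrac{\beta'\{\sum_i d_i k_i(\bar x_{i\cdot} - \bar x_{\cdot\cdot})\}\{\sum_i d_i k_i(\bar x_{i\cdot} - \bar x_{\cdot\cdot})\}'\beta}{\beta'T\beta}$

(1.3.7)  $\qquad = \dfrac{k_1 k_2}{n}\, \dfrac{\beta'(\bar x_{1\cdot} - \bar x_{2\cdot})(\bar x_{1\cdot} - \bar x_{2\cdot})'\beta}{\beta'T\beta}$

and thus $\hat\beta$ is the solution of

(1.3.8)  $\{\dfrac{k_1 k_2}{n}(\bar x_{1\cdot} - \bar x_{2\cdot})(\bar x_{1\cdot} - \bar x_{2\cdot})' - \lambda T\}\hat\beta = 0$

where $\lambda$ is the largest root of

(1.3.9)  $|\dfrac{k_1 k_2}{n}(\bar x_{1\cdot} - \bar x_{2\cdot})(\bar x_{1\cdot} - \bar x_{2\cdot})' - \lambda T| = 0$.

Now since $(\bar x_{1\cdot} - \bar x_{2\cdot})(\bar x_{1\cdot} - \bar x_{2\cdot})'$ is of rank 1 there is only one
non-zero $\lambda$ which is a solution of (1.3.9).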

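It is noted that

(1.3.10)  $\dfrac{k_1 k_2}{n}(\bar x_{1\cdot} - \bar x_{2\cdot})(\bar x_{1\cdot} - \bar x_{2\cdot})' = \sum_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})(\bar x_{i\cdot} - \bar x_{\cdot\cdot})' k_i = H$

where H corresponds to the matrix H defined in the classical
situation (equation (1.2.8)). Also

(1.3.11)  $T = E + H$

where E is defined in (1.2.9). Hence $\lambda$ is the largest root of

(1.3.12)  $|H - \lambda(E+H)| = 0$

and $\hat\beta$ the associated eigenvector.

Evidently with c=2 and $\theta$ defined by (1.2.13) then

(1.3.13)  $\lambda = \dfrac{\theta}{1+\theta}$.

Now if

(1.3.14)  $\{H - \lambda(E+H)\}\hat\beta = 0$

then

(1.3.15)  $0 = \{(1+\theta)H - \theta(E+H)\}\hat\beta \;\Rightarrow\; (H - \theta E)\hat\beta = 0$.

Since $\lambda$ is an increasing function of $\theta$ we see that the vector
$\hat\beta$ is the same in our new approach as for the classical approach
(c=2), a somewhat startling result.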
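
The relation (1.3.13) between the two largest roots can be checked numerically. The sketch below is restricted to p = 2 so that each determinantal equation is a quadratic that can be solved directly; the matrices H and E are hypothetical values chosen for illustration and the helper name is an assumption.

```python
import math


def largest_root(H, E):
    """Largest theta solving |H - theta*E| = 0 for symmetric 2x2 H and E.

    Expanding the determinant gives a*theta^2 - b*theta + c = 0 with
    a = |E|, c = |H| and b the mixed term below. E is assumed positive definite.
    """
    a = E[0][0] * E[1][1] - E[0][1] ** 2
    b = H[0][0] * E[1][1] + H[1][1] * E[0][0] - 2 * H[0][1] * E[0][1]
    c = H[0][0] * H[1][1] - H[0][1] ** 2
    disc = max(b * b - 4 * a * c, 0.0)
    return (b + math.sqrt(disc)) / (2 * a)


# Hypothetical two-group matrices (H has rank one, as it must when c = 2).
H = [[13.5, 18.0], [18.0, 24.0]]
E = [[4.0, 0.0], [0.0, 4.0]]
T = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]

theta = largest_root(H, E)   # classical root of |H - theta E| = 0
lam = largest_root(H, T)     # root of |H - lambda (E + H)| = 0
```

With these values `theta` is 9.375 and `lam` agrees with `theta / (1 + theta)`, as (1.3.13) requires.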




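Case c>2.

In the case of more than two categories we shall maximize
$(\mathrm{corr}(\{d_i\}, \beta))^2$ defined by (1.3.3) over choice of $\{d_i\}$
satisfying (1.3.2), holding the scale of $\beta$ so that $\beta'T\beta = 1$.

Let $\phi_1$, $\phi_2$, $\lambda$ be undetermined multipliers and construct

$Q = (\beta'\sum_i d_i k_i(\bar x_{i\cdot} - \bar x_{\cdot\cdot}))^2 - 2\phi_1 \sum_i d_i k_i + \phi_2(\sum_i d_i^2 k_i - 1) - \lambda(\beta'T\beta - 1)$.

Using the theory of Lagrange undetermined multipliers the
required solution is given by

$\dfrac{\partial Q}{\partial d_i} = \dfrac{\partial Q}{\partial \beta} = \dfrac{\partial Q}{\partial \phi_1} = \dfrac{\partial Q}{\partial \phi_2} = \dfrac{\partial Q}{\partial \lambda} = 0$.

The maximal value is achieved at $\{\hat d_i\}, \hat\beta$ so that

(1.3.16)  $(\hat\beta'\sum_m \hat d_m k_m(\bar x_{m\cdot} - \bar x_{\cdot\cdot}))\, k_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})'\hat\beta - \phi_1 k_i + \phi_2 \hat d_i k_i = 0$,  $i=1,2,\ldots,c$

(1.3.17)  $\{\sum_i \hat d_i k_i(\bar x_{i\cdot} - \bar x_{\cdot\cdot})\}\{\sum_i \hat d_i k_i(\bar x_{i\cdot} - \bar x_{\cdot\cdot})\}'\hat\beta - \lambda T\hat\beta = 0$

(1.3.18)  $\sum_i \hat d_i k_i = 0$
          $\sum_i \hat d_i^2 k_i = 1$
          $\hat\beta'T\hat\beta = 1$.

Summing (1.3.16) over i we have $\phi_1 = 0$ whence

(1.3.19)  $(\hat\beta'\sum_m \hat d_m k_m(\bar x_{m\cdot} - \bar x_{\cdot\cdot}))(\bar x_{i\cdot} - \bar x_{\cdot\cdot})'\hat\beta + \phi_2 \hat d_i = 0$.

Using (1.3.18) it is easily established that

(1.3.20)  $\hat d_i = \dfrac{(\bar x_{i\cdot} - \bar x_{\cdot\cdot})'\hat\beta}{\sqrt{\hat\beta'H\hat\beta}}$

where, as usual,

(1.3.21)  $H = \sum_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})(\bar x_{i\cdot} - \bar x_{\cdot\cdot})' k_i$.

Premultiplying (1.3.17) by $\hat\beta'$ we find

(1.3.22)  $\lambda = \max\,(\mathrm{corr}(\{d_i\}, \beta))^2$.

But

(1.3.23)  $\{\sum_i \hat d_i k_i(\bar x_{i\cdot} - \bar x_{\cdot\cdot})\}\{\sum_i \hat d_i k_i(\bar x_{i\cdot} - \bar x_{\cdot\cdot})\}' = \dfrac{H\hat\beta\hat\beta'H}{\hat\beta'H\hat\beta}$

therefore $\lambda$ is the largest solution of

(1.3.24)  $\dfrac{H\hat\beta\hat\beta'H\hat\beta}{\hat\beta'H\hat\beta} = \lambda T\hat\beta$

that is, the largest solution of

(1.3.25)  $(H - \lambda T)\hat\beta = 0$.

Define E by E = T - H so that, as in the classical case,

(1.3.26)  $E = \sum_i\sum_j (x_{ij} - \bar x_{i\cdot})(x_{ij} - \bar x_{i\cdot})'$

then $\lambda$ is the largest root of $|H - \lambda(E+H)| = 0$ and $\hat\beta$ the associated
eigenvector. Alternatively, with $\lambda = \theta/(1+\theta)$, $\theta$ is the largest root of
$|H - \theta E| = 0$ and $\hat\beta$ satisfies $(H - \theta E)\hat\beta = 0$. Either way, $\hat d_i$ is proportional
to $(\bar x_{i\cdot} - \bar x_{\cdot\cdot})'\hat\beta$.

It is seen then that for any c, the discriminant is numerically
the same for our system as in the classical case despite the fact
that our basic assumptions are widely different.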

1.4 An example with three broad categories.

A group of eighteen math majors are critically examined by
their faculty with the purpose of evaluating the departmental
admissions examination. The faculty agreed to divide the
eighteen into three groups of six according to the performance
of the students in their junior and senior years. After this
had been done, the respective performances in the four-part
admissions exam were collated with the students. The results
are given below.



Top group: 



Middle group: 



88 

87 
92 

88 

12 



82 




84” 




70 


93 




94 




94 


98 




10 


) 


82 


91 




81 




82 



— j 

100 



86 

88 



94 

79 



88" 




90“ 


90 




88 


89 




93 


71 




88 



97 

68 



84 

76 





Low group: 



80~ 




70“ 




89~ 




80“ 




69 




9f 


71 

/ A 




77 

9 i 




i i 




£ A 
V ^ 




/ U 




/D 


71 




69 




69 




69 




68 




71 


80 




83 




85 


1 


8CL 




81 




66, 



The vector x is

x = ( score in part I of admissions exam,
      score in part II of admissions exam,
      score in part III of admissions exam,
      score in part IV of admissions exam )'.



It is the admissions examination which the faculty require to
evaluate with a view to using it to reject students unlikely to
derive great benefit from a college education. In broad terms
one may ask: "is the admission exam any indication of performance
in college; if so how should the results or scores on that
examination be used to select the more promising students."

Although we have not yet developed the distribution theory
relevant to this problem we have available a technique indicated in
the previous section. It is believed that, taken over the
population of students, each admission score is approximately
normally distributed and is not a mixture of populations as
is required by the theory of discriminant functions.

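Using the notation of the previous section we have

$\bar x_{1\cdot} = \dfrac{1}{6}\begin{pmatrix} 502 \\ 546 \\ 554 \\ 504 \end{pmatrix}$,  $\bar x_{2\cdot} = \dfrac{1}{6}\begin{pmatrix} 533 \\ 475 \\ 501 \\ 484 \end{pmatrix}$,  $\bar x_{3\cdot} = \dfrac{1}{6}\begin{pmatrix} 479 \\ 446 \\ 417 \\ 458 \end{pmatrix}$,  $\bar x_{\cdot\cdot} = \dfrac{1}{18}\begin{pmatrix} 1514 \\ 1467 \\ 1472 \\ 1446 \end{pmatrix}$

$H = \sum_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})(\bar x_{i\cdot} - \bar x_{\cdot\cdot})' k_i = \dfrac{1}{18}\begin{pmatrix} 4406 & 1665 & 6044 & 1842 \\ 1665 & 15882 & 19899 & 6774 \\ 6044 & 19899 & 28634 & 9546 \\ 1842 & 6774 & 9546 & 3192 \end{pmatrix}$

$T = \sum_i\sum_j (x_{ij} - \bar x_{\cdot\cdot})(x_{ij} - \bar x_{\cdot\cdot})' = \dfrac{1}{18}\begin{pmatrix} 16420 & -8064 & 16244 & -336 \\ -8064 & 27117 & 14742 & 1188 \\ 16244 & 14742 & 39440 & 15378 \\ -336 & 1188 & 15378 & 21780 \end{pmatrix}$

$E = T - H = \sum_i\sum_j (x_{ij} - \bar x_{i\cdot})(x_{ij} - \bar x_{i\cdot})' = \dfrac{1}{18}\begin{pmatrix} 12014 & -9729 & 10200 & -2178 \\ -9729 & 11235 & -5157 & -5586 \\ 10200 & -5157 & 10806 & 5832 \\ -2178 & -5586 & 5832 & 18588 \end{pmatrix}$

and we require the largest root $\lambda$ of $|H - \lambda(E+H)| = 0$
(see equation (1.3.12)).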
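
The between-group matrix H of this example can be recomputed exactly from the group score totals. The sketch below uses rational arithmetic so that the comparison is exact. The totals and the integer matrices are transcribed from this copy of the report as best they can be read (some digits are damaged in the scan), so they should be treated as assumptions; the variable names are illustrative.

```python
from fractions import Fraction

# Score totals over the six students of each group, parts I-IV
# (values as read from the scan; treat as assumptions).
group_totals = [
    [502, 546, 554, 504],   # top group
    [533, 475, 501, 484],   # middle group
    [479, 446, 417, 458],   # low group
]
k = 6                        # students per group
n = k * len(group_totals)    # 18 students in all
p = 4                        # parts of the admissions exam

means = [[Fraction(t, k) for t in g] for g in group_totals]
grand = [Fraction(sum(g[a] for g in group_totals), n) for a in range(p)]

# 18 * H, with H = sum_i k_i (xbar_i - xbar)(xbar_i - xbar)'
H18 = [[18 * sum(k * (m[a] - grand[a]) * (m[b] - grand[b]) for m in means)
        for b in range(p)] for a in range(p)]

# 18 * T and 18 * E as printed in the report (transcribed from the scan).
T18 = [[16420, -8064, 16244, -336],
       [-8064, 27117, 14742, 1188],
       [16244, 14742, 39440, 15378],
       [-336, 1188, 15378, 21780]]
E18 = [[12014, -9729, 10200, -2178],
       [-9729, 11235, -5157, -5586],
       [10200, -5157, 10806, 5832],
       [-2178, -5586, 5832, 18588]]

# Check of the printed identity E = T - H, entry by entry.
identity_holds = all(T18[a][b] - H18[a][b] == E18[a][b]
                     for a in range(p) for b in range(p))
```

The recomputed `H18` reproduces the printed numerators of H, and the printed entries satisfy E = T - H. The raw scores themselves are not used here because the table of individual scores is not fully legible in this copy.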

Since E+H=T is positive definite we can find a non-singular
matrix M such that $MTM' = I$. This matrix is of course obtained
by performing a Doolittle on T. With F the Doolittle lower
triangular matrix with $\{f_{ii} = 1\}$ then

$FTF' = D$

with $d_{ij} = 0$ $(i \neq j)$, $d_{ii} > 0$,

whence $M = D^{-1/2}F$. We now require the largest latent root of
$MHM'$ which can be obtained by the iterative methods. It is
noted that in the case of two groups, H, and therefore $MHM'$,
is of rank one whereupon largest latent root of $MHM'$ = trace $MHM'$.
In our case, the largest latent root is

$\lambda = 0.8802$

and the associated vector is

$\hat\beta = (0.1165,\; 10.5208,\; 11.1623,\; 1.0000)'$

or any multiple of it.

The $\hat d_i$ are now found from equation (1.3.20) to be (after
standardization),

(0.2943; 0.0076; -0.28477).
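
The Doolittle step described above (find a unit lower triangular F with FTF' diagonal, then scale to obtain M with MTM' = I) can be sketched in a few lines. This is an illustration under stated assumptions: it uses the factorization T = LDL' with F taken as the inverse of L, the function names are hypothetical, and the 3×3 matrix is an arbitrary positive definite example rather than the T of the report.

```python
import math


def doolittle_whitener(T):
    """Return M such that M T M' = I, for symmetric positive definite T.

    Factor T = L D L' (L unit lower triangular, D diagonal), put F = L^{-1}
    so that F T F' = D, then scale the rows: M = D^{-1/2} F.
    """
    p = len(T)
    L = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    D = [0.0] * p
    for j in range(p):
        D[j] = T[j][j] - sum(L[j][m] ** 2 * D[m] for m in range(j))
        if D[j] <= 0:
            raise ValueError("T is not positive definite")
        for i in range(j + 1, p):
            L[i][j] = (T[i][j] - sum(L[i][m] * L[j][m] * D[m] for m in range(j))) / D[j]
    # Invert L column by column using forward substitution.
    F = [[0.0] * p for _ in range(p)]
    for col in range(p):
        for i in range(p):
            rhs = 1.0 if i == col else 0.0
            F[i][col] = rhs - sum(L[i][m] * F[m][col] for m in range(i))
    return [[F[i][j] / math.sqrt(D[i]) for j in range(p)] for i in range(p)]


def congruence(M, A):
    """Compute M A M'."""
    p = len(A)
    MA = [[sum(M[i][m] * A[m][j] for m in range(p)) for j in range(p)] for i in range(p)]
    return [[sum(MA[i][m] * M[j][m] for m in range(p)) for j in range(p)] for i in range(p)]


# Arbitrary positive definite example (not the T of the report).
T_demo = [[4.0, 2.0, 0.6], [2.0, 2.0, 0.5], [0.6, 0.5, 3.0]]
M = doolittle_whitener(T_demo)
identity_check = congruence(M, T_demo)   # numerically the identity matrix
```

Once M is available, `congruence(M, H)` gives MHM', whose largest latent root is the required value; for two groups that root is simply the trace.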






It is of interest to obtain $x'\hat\beta$ for each of the 18 students:

Top group:     2040   2173   2198   1994   2021   2062
Middle group:  1882   1887   1969   1867   1824   1706
Low group:     1628   1671   1655   1585   1650   1669

The $\hat\beta$ used was

$\hat\beta = (0.1165,\; 10.5208,\; 11.1625,\; 1.0000)'$.

The resulting values $x'\hat\beta$ divide themselves very nicely as we
would expect since the squared correlation between the $x'\hat\beta$
and the $\{\hat d_i\}$ is seen to be quite large ($\lambda = 0.8802$).
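
Equation (1.3.20) says that each group score is proportional to the deviation of the group mean of the discriminant values from the grand mean. The sketch below applies that rule to the eighteen rounded values listed above and standardizes as in (1.3.2). Because the listed values are rounded, the resulting scores can only approximate those quoted in the report; the variable names are illustrative.

```python
import math

# Discriminant values for the 18 students, as listed in the text.
z = {
    "top": [2040, 2173, 2198, 1994, 2021, 2062],
    "middle": [1882, 1887, 1969, 1867, 1824, 1706],
    "low": [1628, 1671, 1655, 1585, 1650, 1669],
}

n = sum(len(v) for v in z.values())
grand_mean = sum(sum(v) for v in z.values()) / n

# Deviation of each group mean from the grand mean.
deviation = {g: sum(v) / len(v) - grand_mean for g, v in z.items()}

# Scale so that sum d_i^2 k_i = 1; sum d_i k_i = 0 holds automatically.
scale = math.sqrt(sum(len(z[g]) * deviation[g] ** 2 for g in z))
d = {g: deviation[g] / scale for g in z}

sum_dk = sum(len(z[g]) * d[g] for g in z)
sum_d2k = sum(len(z[g]) * d[g] ** 2 for g in z)
```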




Given any other student (not a member of the set of eighteen)
and his admissions exam score X we could compute $X'\hat\beta = Z$ to
decide which category this new individual is most likely to
belong to.

Clearly the admissions exam is of value though only the
part II and part III scores appear to have a great deal of
correlation with the final group to which a student is judged
to belong.

Some common sense has to be applied to provide group
boundaries for the Top, Middle and Low groups. Looking at
the 18 values of $x'\hat\beta$, a suitable system might be:

judge as top group if $X'\hat\beta > 2000$
judge as middle group if $1700 < X'\hat\beta < 2000$
judge as low group if $X'\hat\beta < 1700$.
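
The scoring and assignment rule just described can be written out as a short sketch. The multipliers are those quoted in the text; the function names, the treatment of a value falling exactly on a boundary (which the text leaves open), and the new student's scores are assumptions made for illustration.

```python
# Multipliers quoted in the text for parts I-IV of the admissions exam.
BETA = (0.1165, 10.5208, 11.1623, 1.0000)


def discriminant_score(x, beta=BETA):
    """Z = X'beta for a four-part admissions score vector x."""
    return sum(b * xi for b, xi in zip(beta, x))


def judge_group(z, upper=2000.0, lower=1700.0):
    """Assign a group from Z using the suggested boundaries.

    Assumption: a value exactly equal to a boundary is placed in the
    lower of the two adjacent groups.
    """
    if z > upper:
        return "top"
    if z > lower:
        return "middle"
    return "low"


# Hypothetical new student scoring 90 on every part.
new_student = (90, 90, 90, 90)
z_new = discriminant_score(new_student)
group_new = judge_group(z_new)
```

Applied to the eighteen listed values, this rule places the top-group student with value 1994 in the middle group, which illustrates why the boundaries are described as a matter of common sense rather than an exact partition.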



The probability of incorrect assignment of course depends
on the actual value of $X'\hat\beta$ (not true in the classical case).
Obviously if for some student $X'\hat\beta = 2400$ then we would feel pretty
safe in judging him to be in the top group.

Comparisons can also be made between new students. Suppose
$X_A$ and $X_B$ are the admissions scores of two new students A and
B then we may judge that A will succeed better than B if

$X_A'\hat\beta > X_B'\hat\beta$;

again, the confidence of our judgement will be a function



CHAPTER 2

NULL DISTRIBUTION

2.1 The null distribution of the largest latent root.

From equation (1.3.26) we find that the values of $\hat\beta$ and $\{\hat d_i\}$
hinge on the value of the largest latent root, $\lambda$, of H in the
metric of H+E; that is, with

$H = \sum_i (\bar x_{i\cdot} - \bar x_{\cdot\cdot})(\bar x_{i\cdot} - \bar x_{\cdot\cdot})' k_i$

$E = \sum_i\sum_j (x_{ij} - \bar x_{i\cdot})(x_{ij} - \bar x_{i\cdot})'$

then $\lambda$ is the largest solution of

$|H - \lambda(H+E)| = 0$.

If no element of x is correlated with the grouping scheme, that
is

p(x | individual categorized in group j)

is independent of j, then the grouping of the individuals results
in a random arrangement of the $x_{ij}$. We have then that the
density of $x_{ij}$ is independent of j and also of i. It is then
easy to show that H and E are independent variates of the
Wishart type when $x_{ij}$ is assumed to have a multivariate normal
density; in fact

$E \sim W_p(V;\; n-c : (0))$

$H \sim W_p(V;\; c-1 : (0))$

[for notation see Part 4, volume I of this contract]

The joint distribution of the latent roots of $|H - \lambda(E+H)|$ is
known, see for example Part 4, volume I, chapter 6 of this
contract, and good approximations to the percentage points
of the largest root are available. The case where c=2 can
be handled in terms of the percentage points of the F-ratio.
2.2 Case c=2.

When c=2 we find that H is of rank 1, consequently there
exists only one non-zero solution of equation (1.3.26). Returning
to section 1.3 we see that

(2.2.1)  $H = \dfrac{k_1 k_2}{n}(\bar x_{1\cdot} - \bar x_{2\cdot})(\bar x_{1\cdot} - \bar x_{2\cdot})'$.

With $\theta = \lambda/(1-\lambda)$, then $\theta$ is the non-zero solution of

(2.2.2)  $|\dfrac{k_1 k_2}{n}(\bar x_{1\cdot} - \bar x_{2\cdot})(\bar x_{1\cdot} - \bar x_{2\cdot})' - \theta E| = 0$.

Now since for $\theta \neq 0$ and any vector u

$|uu' - \theta E| = (-\theta)^p\, |E|\, (1 - u'E^{-1}u/\theta)$

we see that

(2.2.3)  $\theta = \dfrac{k_1 k_2}{n}(\bar x_{1\cdot} - \bar x_{2\cdot})'E^{-1}(\bar x_{1\cdot} - \bar x_{2\cdot})$

which, given E and $\bar x_{1\cdot} - \bar x_{2\cdot}$, is easy to calculate by performing
a Doolittle on E carrying $(\bar x_{1\cdot} - \bar x_{2\cdot})$ (Part 4 Volume II of this
contract). Finally

(2.2.4)  $\dfrac{n-p-1}{p}\,\theta \sim F_{p;\,n-p-1}$;

large values of $\theta$ are significant.

2.3 Case c>2.

In the case c>2 we are committed to the use of percentage
points of the largest $\lambda$ satisfying

$|H - \lambda(H+E)| = 0$;

the distribution of the largest latent root is discussed in
the statistical literature by several authors. In particular,
tables of percentage points are given by D. L. Heck (Ann. Math.
Statist., 1960, pp. 625-642).

As an alternative test of the hypothesis $R_{0\cdot(1,2,\ldots,p)} = 0$ one
may also use

$-2\rho \log |E|/|E+H|$.

This does not require the evaluation of the largest root
and is computationally much simpler, however it is to be
remembered that if $-2\rho \log |E|/|E+H|$ proves significant,
the next step will likely be to establish $\hat\beta$ for which the
largest latent root will be needed.
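
For two groups, the root $\theta$ of section 2.2 and the determinant ratio |E|/|E+H| used in section 2.3 are directly related: since H has rank one, |E+H| = |E|(1+θ). The sketch below checks this for p = 2 with hypothetical values. The conversion of θ to an F-ratio uses the standard two-sample Hotelling relation, taken here as an assumption about the intended form of (2.2.4); the function names are illustrative.

```python
def two_group_theta(k1, k2, diff, E):
    """theta = (k1*k2/n) * diff' E^{-1} diff for p = 2.

    diff is the difference of the two group mean vectors and E the
    within-group sums of squares and products matrix.
    """
    n = k1 + k2
    det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
    quad = (E[1][1] * diff[0] ** 2
            - 2 * E[0][1] * diff[0] * diff[1]
            + E[0][0] * diff[1] ** 2) / det
    return k1 * k2 / n * quad


def determinant_ratio(H, E):
    """|E| / |E + H| for 2x2 matrices."""
    def det(M):
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]
    S = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    return det(E) / det(S)


# Hypothetical two-group example: k1 = k2 = 3, p = 2.
k1, k2, p = 3, 3, 2
n = k1 + k2
diff = (-3.0, -4.0)
E = [[4.0, 0.0], [0.0, 4.0]]
H = [[k1 * k2 / n * diff[i] * diff[j] for j in range(2)] for i in range(2)]

theta = two_group_theta(k1, k2, diff, E)
ratio = determinant_ratio(H, E)
f_value = (n - p - 1) / p * theta   # assumed F conversion, p and n-p-1 d.f.
```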











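CHAPTER 3

THE ALTERNATIVE DISTRIBUTION. Case c=2

3.1 Preliminaries

For the case c=2, the multiple correlation coefficient,
R say, between the vectors $x_{ij}$, $j=1,2,\ldots,k_i$; $i=1,2$ and the
scores (see equation 1.3.5)

(3.1.1)  $d_1 = -\sqrt{k_2/(k_1 n)}$
         $d_2 = \sqrt{k_1/(k_2 n)}$

may be obtained from

(3.1.2)  $R^2 = (\sum_i\sum_j d_i (x_{ij} - \bar x_{\cdot\cdot}))'\, C^{-1}\, (\sum_i\sum_j d_i (x_{ij} - \bar x_{\cdot\cdot}))$

         $= (\sum_i d_i k_i \bar x_{i\cdot})'\, C^{-1}\, (\sum_i d_i k_i \bar x_{i\cdot})$

         $= \dfrac{k_1 k_2}{n}(\bar x_{1\cdot} - \bar x_{2\cdot})'\, C^{-1}\, (\bar x_{1\cdot} - \bar x_{2\cdot})$

where

(3.1.3)  $C = \sum_i\sum_j (x_{ij} - \bar x_{\cdot\cdot})(x_{ij} - \bar x_{\cdot\cdot})'$.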

Accordingly, we may use some of the work developed in Part
I of this contract (in particular chapter 2 of that part)
for a suitable underlying model.

3.2 An underlying model

Assume that an unknown variable $x_0$ exists for every
individual. For the n individuals whose vector measurement
$x_{ij}$ is available ($i=1,2$; $j=1,2,\ldots,k_i$) there is the variable
$x_{0ij}$ which is unobserved but about which the following
information is available.

(3.2.1)  $\sup_j \{x_{01j}\} < \inf_j \{x_{02j}\}$

that is the variable $x_{01j_1}$ for an individual in group 1 is less
than the $x_{02j_2}$ for any individual in group 2. We may imagine
then that the dichotomy of n individuals into the two groups
is made on the basis of the unobservable $x_0$ (hereafter called
the characteristic normal variable).

It is recognized that the existence of the characteristic
normal variable represents an idealized situation. In effect
we are saying that the categorization of an individual is made
on the basis of a single characteristic variable. Obviously
one can think of many ways of assigning an individual to one
of various categories; however the use of the characteristic
normal variable seems to us not unreasonable and fairly simple
to apply. Further the critical regions associated with the
test of hypothesis $H_0: R=0$ do not depend upon the assumptions
of the existence of characteristic normal variables although
of course the investigation of the power of the critical
region does.








r . c * — .. 




(3 






o Vii 



:i^ 




where x is the observed vector and $x_0$ the corresponding
characteristic normal variable. That is to say $(x_0;\, x_1, x_2, \ldots, x_p)$
have a multivariate normal density with mean vector and
dispersion matrix as displayed in (3.2.2).

If it be required to divide n individuals into two categories
with $k_1$ and $k_2$ members then we construct two groups

$\{x_{011}, x_{012}, \ldots, x_{01k_1}\}$,  $\{x_{021}, x_{022}, \ldots, x_{02k_2}\}$

which possess the property given in (3.2.1), that is

$\sup_j \{x_{01j}\} < \inf_j \{x_{02j}\}$;

the groups are therefore uniquely constructed with probability
one.

(It is noted that in another problem $k_1$ and $k_2$ ($k_1+k_2=n$)
may be random variables; the dichotomy being achieved via

$\sup_j\{x_{01j}\} < T < \inf_j\{x_{02j}\}$.)
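
The construction of the two groups from the unobserved characteristic variable can be imitated by simulation: draw n standard normal values, rank them, and place the lowest $k_1$ in group 1 and the remainder in group 2. The sketch below does this with a fixed seed; the function name, the seed and the sizes are illustrative assumptions.

```python
import random


def dichotomize(x0, k1):
    """Split individuals into two groups on the characteristic variable x0.

    Returns the index lists of group 1 (the k1 smallest x0 values) and
    group 2 (the rest), so that sup(group 1) < inf(group 2) as in (3.2.1).
    """
    order = sorted(range(len(x0)), key=lambda i: x0[i])
    return order[:k1], order[k1:]


rng = random.Random(0)                              # fixed seed for repeatability
x0 = [rng.gauss(0.0, 1.0) for _ in range(18)]       # unobserved characteristic values
group1, group2 = dichotomize(x0, 15)                # k1 = 15, k2 = 3

largest_in_group1 = max(x0[i] for i in group1)
smallest_in_group2 = min(x0[i] for i in group2)
```

Because ties occur with probability zero for a continuous variable, the split is unique, which is the point made in the text.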

3.3 Development

The development of the distribution of $R^2$ now parallels
the development in Part I of this contract. In particular we



IkAUH Y ^ 



^ ? h ^ '^T> ^ h~]J W , 

(3.3.1) i-— } %T^ 









(v=n-l) 



where 


(3.3.2) 


a®--'” 


C3.3.31 


^11 1 


(3.3.4) 


«ii i {» 


and 


hi^ are 


(3.3.5) 




(3.3.6) 




with

(3.3.7)  $\delta = R_0/\sqrt{1-R_0^2}$

$a(a-1)(a-2)\cdots(a-r+1)$



and are independently distributed conditional on {^Qi j ^ 

^ v2 



ID 



o 



and the population multiple correlation coefficient between 
Xq and * 

Upon substituting values for d^ and d^ 

(3.3.8) (Hq2'Xo1>^ 

n 



o 

ERIC 



Writing

(3.3.9)  $S^2 = \sum_i\sum_j (x_{0ij} - \bar x_{0\cdot\cdot})^2$

(3.3.10)  $\eta^2 = \dfrac{k_1 k_2}{n}(\bar x_{02\cdot} - \bar x_{01\cdot})^2$

we have




(3.3. U) 

1 ^, 2,„2 2 . 

(X2 ’ 



From the form of equation 3.3.1 evidently 




where r is the "multiple correlat.ion" in the case p=l 
and r=H“l, 

3.4 The case p=1; moments of a correlation coefficient.

In the last section, we saw that the moments of our
test criterion $R^2$ (general p) were a simple function of
the moments of a test criterion appropriate to the
case p=1. Write

The conditional joint density of $h_1$ and $h_2$ is



(3.4.10) p(h, , J { ^} ) = 

Ai A A V/A J 



-Js( ^) 
e e 

v+1 

.”T“ 



00 jj+i-l ^ ^ ^ 

XX X ^ 



oo 

Z c 

i=0 j=0 



2 ^H: r (!s+i)2^^j!r (!s+j) 



Transforming from where 

(3.4.11) 

and integrating out the we find 



(3.4.12) p(Zj{x^^j} ) = 



V. . 



~i 



CO 


00 


s 


2 


i==0 


j-0 



Z' 



(1-Z) 



B(|+j;Jg+i) 

over the region $0 \le Z \le 1$.

The moments of $r^2$ conditional on $\{x_{0ij}\}$ are now a simple matter.
In fact



(3.4.13) f (r^|{x. .}) 
P 103 



~^C A^‘fX2) 



E 



(Aj^+X^) 



k 



k!-2’'{S^k) 




!s+i-l i,j 



l” 




(3.4.14) 



(r |{ Xoij})#e 



1/2 



3 * 



I ^ 
^ 0 



ki2^(2ii+k) 



12 }^ 2 ^ ^ 



3 , i! 

0 



kl 2 ’'{ 2 ± 2 + k ) 



C2l 



1 , ,2 “ 



+ ($A,) I 
^ ^ 0 



k ! 2 '"( S « + k ) 



C2l 



(3.4.15) ;^{r6|{^^.,}) e 



V^Ui+A 2> 



15 : 

■^0 



00 



+k) 



£5 . „ (X, +X 5 ) 



+i|(|Aj^)^ Z 



n + 7 .,.. i : 33 "' 2 ''l 



1 ^ ” (X +X 

+ (~xj z ^ Aj _+ a 2 ; 



kl 2 "( iiji ^ k ) 



1 c n 4*9 Cs 3 

k ! 2 ^( S |^ + k ) 



(3.4.16) ^(r®|{Xoij)) 



" 2 (^ 1+ >2 ) 



105 “ (^1+^2^^ ^ 10£ ” (^i+X2)^ 

16 0 ^,2N£±i+)0^"^ " “kl2’'(£±l+k) 






105,1, “ <^1+^2’ 
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° k!2''(S±ifk) 
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+ 14ix,)'‘ Z — — = — 
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^ ■*■ 0 

, . «k,n+13 
k: ^ ^5 — 



03 



k; 



To work toward the non-conditional moments we use
the fact that, for a set of n independent normal
variates

(3.4.17)  $p(\eta^2/S^2,\; S^2) = p(\eta^2/S^2)\, p(S^2)$

that is $\eta^2/S^2$ and $S^2$ are independently distributed. Further
$S^2$ has a central $\chi^2$-distribution with $n-1=\nu$ degrees of
freedom.

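Consequently

(3.4.18)  $\mathcal{E}\{S^{2k} e^{-\frac{1}{2}\delta^2 S^2}\} = \dfrac{2^k\, \Gamma(\frac{n-1}{2}+k)}{\Gamma(\frac{n-1}{2})\,(1+\delta^2)^{\frac{n-1+2k}{2}}}$.

Now, using (3.4.17) and

(3.4.19)  $\mathcal{E}(\eta^2/S^2)^h = \mathcal{E}(\eta^{2h})/\mathcal{E}(S^{2h})$

we have

(3.4.20)  $\mathcal{E}(\eta^{2h} S^{2k} e^{-\frac{1}{2}\delta^2 S^2}) = \mathcal{E}\{(\eta^2/S^2)^h\, S^{2(h+k)} e^{-\frac{1}{2}\delta^2 S^2}\}$

          $= \mathcal{E}(\eta^2/S^2)^h\; \mathcal{E}(S^{2(h+k)} e^{-\frac{1}{2}\delta^2 S^2})$

          $= \dfrac{\mathcal{E}(\eta^{2h})\, 2^{k+h}\, \Gamma(\frac{n-1}{2}+k+h)}{(n-1)(n+1)\cdots(n-3+2h)\; \Gamma(\frac{n-1}{2})\,(1+\delta^2)^{\frac{n-1}{2}+h+k}}$.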
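
The expectation in (3.4.18) is an integral against the central chi-square density with n-1 degrees of freedom, so its closed form can be compared with a direct numerical integration. The sketch below does this with Simpson's rule. The closed form coded here is the standard gamma-integral result, taken as an assumption about the intended content of (3.4.18) because that equation is damaged in this copy; the function names and the test values of n, k and $\delta^2$ are illustrative.

```python
import math


def closed_form(n, k, delta2):
    """E{S^(2k) exp(-delta2*S^2/2)} for S^2 ~ chi-square with n-1 d.f."""
    nu = n - 1
    return (2 ** k * math.gamma(nu / 2 + k)
            / (math.gamma(nu / 2) * (1 + delta2) ** (nu / 2 + k)))


def by_quadrature(n, k, delta2, upper=200.0, steps=20000):
    """The same expectation by Simpson's rule on [0, upper]."""
    nu = n - 1
    norm = 2 ** (nu / 2) * math.gamma(nu / 2)

    def integrand(s):
        if s <= 0.0:
            return 0.0
        # s^k * exp(-delta2*s/2) times the chi-square density at s
        return s ** (k + nu / 2 - 1) * math.exp(-0.5 * (1 + delta2) * s) / norm

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


exact = closed_form(10, 1, 0.5)
numeric = by_quadrature(10, 1, 0.5)
```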



The nonconditional moments of $r^2$ are then, using (3

(3.4.21)  $\mathcal{E}(r^2) =$

(3.4.22)  $\mathcal{E}(r^4) =$

(3.4.23)

3.5 Moments of $\eta^2$. Case $k_1 = n-1$.

Clearly moments of $r^2$ and thence $R^2$ are available for given
$\delta^2 = R_0^2/(1-R_0^2)$ when moments of $\eta^2$ are established; this section
is devoted to establishing moments of $\eta^2$ for the case of two
groups when one group contains n-1 individuals, the other
group only one individual. Such a situation arises when only
the "best" individual is picked out or singled out; the
remaining (n-1) being lumped together in the category of
"not best".

For convenience we replace $x_{01}, x_{02}, \ldots, x_{0n}$ by
$u_1 < u_2 < \ldots < u_n$ so that we can consider $u_1, u_2, \ldots, u_n$ to be
a ranked set of n independently drawn observations from
the standard normal population. With

(3.5.1)  $\bar u = \dfrac{1}{n}\sum_{i=1}^{n} u_i$

then for $k_1 = n-1$, $k_2 = 1$

(3.5.2)  $\eta^2 = \dfrac{n\,(u_n - \bar u)^2}{n-1}$.
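
The special form (3.5.2) follows from the general two-group expression $\eta^2 = (k_1 k_2/n)(\bar x_{02\cdot} - \bar x_{01\cdot})^2$ when the second group holds only the largest value. The sketch below checks the agreement numerically. The general expression is the reading of (3.3.10) adopted here from a damaged equation, so it should be treated as an assumption; the sample values and function name are arbitrary.

```python
def eta_squared(group1, group2):
    """(k1*k2/n) * (mean of group 2 - mean of group 1)^2."""
    k1, k2 = len(group1), len(group2)
    n = k1 + k2
    mean1 = sum(group1) / k1
    mean2 = sum(group2) / k2
    return k1 * k2 / n * (mean2 - mean1) ** 2


# Arbitrary ranked sample standing in for u_1 < ... < u_n.
u = sorted([0.3, -1.2, 0.8, 1.9, -0.4])
n = len(u)
u_bar = sum(u) / n

general = eta_squared(u[:-1], u[-1:])            # "not best" versus "best"
special = n * (u[-1] - u_bar) ** 2 / (n - 1)     # right-hand side of (3.5.2)
```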



2 

The process of giving moments of t) in terms of moiuents 

of u is routine, however of some use is the fact that 
n 

u -u and u have independent distributions* 
n 

We find after a little algebra: 

(3.5.3) ^(n^)= HO 

.5.4) g(n^)=^^^ 



(3 



(3.5.5) & (n ) 






{3.5.6) 



420 



n 



Since the first ten moments of the extreme ($u_n$) from a sample of n from a standard normal population are tabulated by Ruben for n=1(1)50 we give the first four moments of $1-r^2$ using equations (3.5.3) thru (3.5.6) and (3.4.21) thru (3.4.24) for the cases

$k_1=n-1$; $k_2=1$

n=4(1)20(5)50

$R_0$=0.05, 0.10(0.10)0.90, 0.95

These moments, which are of interest, are given in Appendix A.
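The tabulation shorthand a(h)b used here means "from a to b in steps of h". A small sketch that expands the two grids quoted above is given below; the helper name `expand_range` is illustrative only.

```python
def expand_range(start: int, step: int, stop: int) -> list[int]:
    """Expand the tabulation shorthand start(step)stop into explicit values."""
    if step <= 0 or (stop - start) % step != 0:
        raise ValueError("stop must be reachable from start in whole steps")
    return list(range(start, stop + 1, step))

# n = 4(1)20(5)50: steps of 1 from 4 to 20, then steps of 5 up to 50.
n_values = expand_range(4, 1, 20) + expand_range(25, 5, 50)

# R_0 = 0.05, 0.10(0.10)0.90, 0.95, built in integer hundredths to avoid float drift.
r0_hundredths = [5] + expand_range(10, 10, 90) + [95]
r0_values = [h / 100 for h in r0_hundredths]
```

This gives 23 sample sizes and 11 values of $R_0$, which matches the number of rows in each block of Appendix A.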



3.6 Moments of general two group situation.

In the general case of two groups we shall, for convenience, put

(3.6.1)   $k_1 = n-k_2 = k$



It proved more convenient to work with a function. 



(3.6.2) t= — - f - u 

(n+2)^ 



so that 



/o £ 2 (n*f2) t 

(3.6.3) n = 







Write 






/ n £! ^ \ «i 

^<^«VfrnC/ \A 



(k) 



then, since u^j^j-u and u are independently distributed we 
find 



(3.6.5) ^ (t)«- 



nk' 



(n*f2) 



‘“(k)*"n •* 



2 n^k^ 

(3.6.6) / (t^)=iLJi-. 

(n+2) 



n 



^ _h 

Using a method developed by Saw (1958) the moments of 
can be obtained as a power series in l/(n+2). In fact if 

(3.6.7) <j>Cpj^,n,a,b)* = 



a 



u. 



1 


„2 1 




«r 


e 




“r "7u^ 


/ e 


du J 



••00 



when has the distribution of the r largest of n 
standard normal variates and where 



1^1 



(3.6.8) pj^=k/(n+l) 



R 



’^a b 'ised for (pj^,n,a,b) when n and k are fixed 



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then it is easily shown (following Saw, 1958) that 



(3 



.6.9) ^it 






— i — / niji(j^2+n{k-i) (l-3i!ij^^j^)+n(k-l) {k-2)i)2^Q-lt‘ 
(n+ 2 ) ' ' 



<3.6.10) ^ (t^)= 



6 ' 



(n+ 2 ) 



+n‘‘(k-l) (k- 2 ) ( 8*2 .+254>2^2‘*'^' 



+n‘^(k-l) (k-2) (k-3) ( 6 ') 2 ,o"^®'<’ 3 ,l> 
+n^(k-l) (k-2) (k-3) (k- 4 )V 4 ^Q 



- 6 nk^(k-l) (k- 2 )i)i 2 n+k^ 



For fixed values of pj^=k/(n+l) we may write 



(3.6.11) *3 b“i=i Hj(Pit>afb ) ^ o(n+2) 



-(i+1) 






(n+ 2 ) 



Values of $H_i(p_k; a, b)$ were originally given by Saw for $a+b \le 4$, i=0(1)5 and $p_k$=0.50(0.05)0.80; however, since those computations were made on a mechanical hand calculator the values were recalculated under this project and the range extended to $a+b \le 4$; $p_k$=0.50(0.05)0.95.

The recomputed tables (which will be of use elsewhere in the theory of order statistics) are given in Appendix B.

Using (3.6.9), (3.6.10) the first two moments of t can be expressed as a power series in $1/(n+2)$ for a fixed value of $p_k$; in fact we may write

(3.6.12)   $\mathcal{E}(t^s) = \sum_{j=0}^{\infty} J_j(p_k, s)\,(n+2)^{-j}$

and values of $J_j(p_k, s)$ are given in Table I below for

s=1, 2
$p_k$=0.50(0.05)0.95
j=0(1)5

(the choice $p_k \ge \tfrac{1}{2}$ is not restrictive since one group or the other must have at least half the observations contained in it).
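A series in powers of $1/(n+2)$ with tabulated coefficients for j=0(1)5 is, in practice, evaluated as a truncated sum. The sketch below does this by Horner's rule. The coefficients in the example are placeholders chosen for easy checking; they are not values from Table I, and the function name is illustrative.

```python
def truncated_series(coefficients, n):
    """Evaluate sum_j coefficients[j] * (n + 2)**(-j) by Horner's rule."""
    x = 1.0 / (n + 2)
    total = 0.0
    # Start from the highest-order coefficient and fold inwards.
    for c in reversed(coefficients):
        total = total * x + c
    return total

# Placeholder coefficients, NOT values from Table I of the report.
example = truncated_series([1.0, 2.0, 4.0], n=2)  # 1 + 2/4 + 4/16
```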



The net result now is that for the case of two groups we have four moments of $R^2$ for general p when $k_1=n-1=n-k_2$, and two moments of $R^2$ for general p and general $k_1$ ($1<k_1<n-1$). On the basis of these moments we give in the next chapter an indication of the power of our proposed method and find that the power is surprisingly good.
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CHAPTER 4 



THE POWER OF THE TEST OF $R_0=0$.

4.1 Introduction

We consider the case of two groups only (i.e. c=2). It is to be expected that, all else being equal, as the number of groups increases, the power of the test will also increase since essentially more information is at the statistician's disposal. When c=n we reach the case discussed in Part I of this contract, where it was found that the efficiency of the test based on a complete set of rankings was very close to unity. This was the case when

$d_j = \left(j - \dfrac{n+1}{2}\right) \Big/ \left(\dfrac{n(n^2-1)}{12}\right)^{1/2}, \qquad j = 1, 2, \ldots, n.$
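Reading the scores for the complete-ranking case as centred ranks $j-(n+1)/2$ divided by $\{n(n^2-1)/12\}^{1/2}$ (an interpretation of the printed formula), they sum to zero and have unit sum of squares, because $\sum_j (j-(n+1)/2)^2 = n(n^2-1)/12$. A minimal sketch, with an illustrative function name:

```python
def centred_rank_scores(n: int) -> list[float]:
    """Scores d_j = (j - (n+1)/2) / sqrt(n(n^2-1)/12) for j = 1..n."""
    if n < 2:
        raise ValueError("need at least two observations")
    scale = (n * (n * n - 1) / 12.0) ** 0.5
    return [(j - (n + 1) / 2.0) / scale for j in range(1, n + 1)]

d = centred_rank_scores(15)
total = sum(d)                            # zero up to rounding
sum_of_squares = sum(x * x for x in d)    # one up to rounding
```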



4.2 Large sample behaviour

Taking the case p=1; c=2 and general n we consider the conditional moment generating function of $-2\rho\log(1-r^2)$ where $\rho$ is some arbitrary constant and r is the correlation between the $d_j$ and the (scalar) normally distributed observations $x_j$.

We write



(4.2.1) 




(S) . 

-2plog (l~r^) 



n 



/ 2 
' s 



= ^^exp(-2peiog(l-r^)) 2^ 



= 



2x 

r ) 



-9 



p6 




Required then is the conditional $(-2\rho\theta)$-th moment of $1-r^2$, which is available using the work of section 3.4. In fact
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(4a2.2) 4 , ( 9 ) 



■2plog (1-r ) I n ^2 



n~l 

O ij o 









s“) 



^ ^ t 

Xi J 1 



r ) 

n^) 



B{i+|; j4-j-2p0) 
B(i+|; f^-j) 



where $\nu=n-2$.

Now, using the expansion for $\log\Gamma(x+h)$ in terms of the Bernoulli polynomials in h (see for example Part 4 of volume I of this contract, in particular page 94) we have

(4.2.3) log ^j-2p9) _ 

B(i+^; ^j) 



. . 1 , 



00 





r 1 


r 

P 


(1-20) 



-4 



with 

(4.2.4) i+j-p)-Bj+l(7<-i-f> I 

r(r+l) ^ 



and $B_s(h)$ the s-th Bernoulli polynomial in h.

In particular,

(4.2.5) -|(i+i) (v+2j+j-2p-| 







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r 



(4.2.6) 



7r2-|(i+|)^ (v-^-+i+2j-2p) +i(v-^i+2j-2p) 

/V+l,.,. \ /V, . \,1 > 

“(“2 — ^i+j~p) I 



If we now set

(4.2.7)   $\rho = \tfrac{1}{2}\nu + a$

with a of order zero in n then



(4.2.8) ir^= 



(-) 



r(r+l) 



Vl (|+i+j-a)-Bj,+i(j-a) 



so that $\pi_r$ is of order zero in n.

With $\rho$ defined by (4.2.7) we get



(4.2.9) B(i+i; |+j-2p0) 



B(i+|, ^j) 






(1-26) 



r4 



1 + 






(4.2.10) (j) (6) 



2. ^ 2 



-2plog(l-r )\r\ j 

i ' 2 

1 c 



-1=- /^)+0(i 
(1-2.6) ' '' ' 



and using this in (4.2.2) after performing the summations over i and j





+ i (nr^jbj 



+ i bj +0(ij) 

P P 

where $b_1$ and $b_2$ converge to limits which are constant in $\theta$ and $\delta^2\eta^2/s^2$ as n approaches infinity. For large n therefore, since $\rho$ behaves as n, the first term of (4.2.10) describes the behaviour of the moment generating function of $-2\rho\log(1-r^2)$ conditional on $\eta^2/s^2$.

Allowing n to go to infinity and $\delta^2$ to go to zero in such a manner that



m 



w 

B 





Conditional on $\eta^2/s^2$ therefore, $-2\rho\log(1-r^2)$ is asymptotically
distributed as the square of a normal variate with mean




and unit variance. 
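The square of a normal variate with unit variance is a noncentral chi-square variate with one degree of freedom, and its upper tail probability grows with the size of the mean. The sketch below illustrates this using only the normal distribution function; the cutoff 3.841 is the familiar approximate upper 5% point of the central chi-square with one degree of freedom, and the means tried are arbitrary illustrative values rather than quantities from the report.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tail_prob_squared_normal(mean: float, cutoff: float) -> float:
    """P(Z^2 > cutoff) for Z ~ N(mean, 1), with cutoff >= 0."""
    root = math.sqrt(cutoff)
    # Z^2 > cutoff means Z > root or Z < -root.
    return (1.0 - normal_cdf(root - mean)) + normal_cdf(-root - mean)

cutoff = 3.841  # approximate upper 5% point, central chi-square, 1 d.f.
means = (0.0, 0.5, 1.0, 2.0, 3.0)
probs = [tail_prob_squared_normal(m, cutoff) for m in means]
```

With mean zero the probability is about 0.05, and it rises steadily as the mean moves away from zero, which is the large-sample mechanism behind the power of the test.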



Thus $\Pr\{r^2 > a \mid \eta^2/s^2\}$ is an increasing function of $\lambda$ for any a, and thus $\Pr\{r^2 > a\}$ is an increasing function of $\lambda$ (non-conditionally on $\eta^2/s^2$).



4.3 Small-sample behaviour

To discuss the small sample behaviour, it is of interest to consider the regular product-moment correlation coefficient between the set of variables $\{(u_i, v_i)\}_{i=1}^{n}$ where,



(4.3.1) 




,n 



The product moment estimate of $\rho$, call this $\hat{\rho}$, is given as

(4.3.2)   $\hat{\rho}^2 = \dfrac{\{\sum (u_i-\bar{u})(v_i-\bar{v})\}^2}{\sum (u_i-\bar{u})^2 \, \sum (v_i-\bar{v})^2}$






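Now set

(4.3.3)   $a_i = (u_i-\bar{u}) \big/ \{\sum (u_i-\bar{u})^2\}^{1/2}, \qquad i = 1, 2, \ldots, n$

so that

(4.3.4)   $\hat{\rho}^2 = p^2/(p^2+q^2)$

where

(4.3.5)   $p = \sum a_i (v_i-\bar{v})$

(4.3.6)   $q^2 = \sum (v_i-\bar{v})^2 - [\sum a_i (v_i-\bar{v})]^2$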
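The decomposition can be checked numerically. The sketch below reads (4.3.3) to (4.3.6) as: weights $a_i$ equal to the centred $u_i$ scaled to unit sum of squares, $p=\sum a_i(v_i-\bar{v})$, and $q^2=\sum(v_i-\bar{v})^2-p^2$; this reading is an interpretation of the printed text. It then confirms that $p^2/(p^2+q^2)$ agrees with the squared product-moment correlation computed directly. The seed, the sample size 15 and the value 0.6 are hypothetical choices.

```python
import random

def decompose(u, v):
    """Return (p, q_squared, rho_hat_squared) via the a_i weights."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s = sum((x - u_bar) ** 2 for x in u) ** 0.5
    a = [(x - u_bar) / s for x in u]                        # weights a_i
    p = sum(a_i * (y - v_bar) for a_i, y in zip(a, v))      # regression part
    q_squared = sum((y - v_bar) ** 2 for y in v) - p ** 2   # residual part
    return p, q_squared, p ** 2 / (p ** 2 + q_squared)

def squared_correlation(u, v):
    """Squared product-moment correlation computed directly."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((x - u_bar) * (y - v_bar) for x, y in zip(u, v))
    s_uu = sum((x - u_bar) ** 2 for x in u)
    s_vv = sum((y - v_bar) ** 2 for y in v)
    return s_uv ** 2 / (s_uu * s_vv)

rng = random.Random(2024)   # arbitrary seed
rho = 0.6                   # hypothetical population correlation
u = [rng.gauss(0.0, 1.0) for _ in range(15)]
v = [rho * x + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for x in u]
p, q_squared, rho_hat_squared = decompose(u, v)
```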



Now

(4.3.7)   $v_i \mid u_i \sim N(\rho u_i;\ 1-\rho^2)$

so that

(4.3.8)   $p \mid \{u_i\}_1^n \sim N\{\rho \sum a_i (u_i-\bar{u});\ 1-\rho^2\}$

or

(4.3.9)   $p^2 \mid \{u_i\}_1^n \sim (1-\rho^2)\,\chi_1^2(\delta^2 s^2)$

where

(4.3.10)   $\delta^2 = \rho^2/(1-\rho^2)$

and

(4.3.11)   $s^2 = \sum_{i=1}^{n} (u_i-\bar{u})^2$

Also

(4.3.12)   $q^2 \mid \{u_i\}_1^n \sim (1-\rho^2)\,\chi_{n-2}^2(0)$

and $p^2$ and $q^2$ are independently distributed conditional on $\{u_i\}_1^n$ (this by the standard theory of least squares).




'‘2 2 2 2 

The distribution of p =p /(p +<3 ) is the same as that lor 

except that for the case of p^, r\^ becomes identically equal to 

n 2 ^2 

Conditional on {u . t therefore r behaves as p except 

^ 1 

2 2 2 _ 
that 6~ has to be replaced by 6 n /_2. it follows from 

s 

the properties of the regular correlation coefficient 

2 

(or rather its square :-p ) that 

2 I 2 2 

(4,3.13) I Pr{r >a n ,7^ increases with r^; n fixed 

/s ° 



2 2 2 
Pr{r >a|n / 2} increases with n? r^ fixed. 
' s o 



It has been easy to discuss this in the context when p=1. Obviously, by comparing our $R^2$ with the estimate $\hat{R}^2_{0(1,2,\ldots,p)}$ of the square of the regular multiple correlation coefficient between $x_0$ and $(x_1, x_2, \ldots, x_p)$ we draw the same conclusions for general dimension p.

Finally then, if $\{d_j; x_j\}_{j=1}^{n}$ is the observed data drawn from a population with multiple correlation $R_0$ (that is: the multiple correlation between the unobserved $x_0$ and the x) then if

$R = \max_{\alpha} \operatorname{corr}(\{d_j\};\ \alpha' x_j)$

as defined throughout this work, then

$\Pr\{R^2 > a\}$ increases with $R_0$, n fixed
$\Pr\{R^2 > a\}$ increases with n, $R_0$ fixed



4.4 Numerical values of power in a specific case.

Since we have moments of $R^2$ under the alternative hypothesis it is possible to fit a frequency curve to the moments of $R^2$ to obtain an approximation to the power. It turns out that a Type I Pearson curve provides a good fit, as was to be anticipated. The case n=15 was considered:






Power of the size 0.05 test of $H: R_0=0$

n=15    c=2    $k_1+k_2=15$

  $R_0$     $k_1$=14 *        $k_1$=12 **       $k_1$=8 **
            ($p_k$=7/8)       ($p_k$=0.75)      ($p_k$=0.50)

  0.00        0.05              0.05              0.05
  0.05        0.05              0.05              0.05
  0.10        0.05              0.06              0.06
  0.20        0.06              0.07              0.08
  0.30        0.07              0.10              0.15
  0.40        0.10              0.16              0.23
  0.50        0.14              0.21              0.33
  0.60        0.18              0.28              0.44
  0.70        0.21              0.41              0.57
  0.80        0.26              0.66              0.81
  0.90        0.31              0.83              0.95
  0.95        0.37              0.93              0.99

(* four moment fit; see section 3.5)
(** two moment fit; see section 3.6)






The power, at least for n=15, is not high for the case of two groups when one group has only one member in it; considering the almost complete lack of information (on $R_0$) which this situation represents, it is perhaps surprising that R is of any use at all. For a more reasonable dichotomy of individuals ($p_k$=0.75 and $p_k$=0.50) the power is quite high. It is noted that for given $R_0$ the power in the case $p_k$=0.50 is greater than the corresponding figure for $p_k$=0.75, which in turn is higher than the corresponding figure for $p_k$=7/8 (case $k_1$=14). This confirms what one would anticipate: that for the case of two groups, the most information is to be had when the group sizes are equal.
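A Pearson Type I curve on the unit interval is a beta distribution. As a simplified illustration of curve fitting from moments, the sketch below fits a beta distribution on [0, 1] from a mean and a variance only. This is a plain two-moment method-of-moments fit, not necessarily the fitting procedure used in the report, and it is applied to $1-r^2$ rather than to $R^2$. The input values are the first two entries of the n=15 row of the $R_0$=.05 block of Appendix A, assumed here to be the mean and the second central moment.

```python
def fit_beta_by_moments(mean: float, variance: float):
    """Method-of-moments Beta(alpha, beta) on [0, 1] from a mean and a variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    common = mean * (1.0 - mean) / variance - 1.0
    if common <= 0.0:
        raise ValueError("variance too large for a beta distribution")
    return mean * common, (1.0 - mean) * common

# Appendix A, R_0 = .05, n = 15: first two tabulated moments of 1 - r^2,
# ASSUMED to be the mean and the variance (second central moment).
alpha, beta = fit_beta_by_moments(0.92818386, 0.00836449)
```

The fitted parameters come out near 6.47 and 0.50. For comparison, when the population correlation is zero and p=1, $1-r^2$ follows a beta distribution with parameters (n-2)/2 and 1/2, that is 6.5 and 0.5 for n=15, so a near-null value of $R_0$ gives a fit close to the null curve.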






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SUMMARY 



We have presented a possible approach to the problem of classifying an individual on the basis of a vector of scores when the usual classical model does not hold. It is our contention that within a group homogeneous with regard to background there do not exist two distinct populations, a "fail" population and a "pass" population; rather there is a continuous spectrum, the members at the upper end of which are more likely to pass than those members to be found at the lower end of the spectrum.

Making use of this new model we are able to give a method of evaluating the vector of scores (on admission tests possibly) in respect of their value as indicators of the ultimate worth or performance of the individual.

The efficiency of the method is surprisingly high.

It is noted that, in addition to the criticisms of the use of the classical discriminant approach leveled above, the classical approach requires a large calibration sample (that is, in our notation n must be large) in order that the classical theory may hold approximately, for in the classical theory we must first pick out those individuals which cannot be readily assigned to a group. In our model, the individuals who cannot be readily assigned form a (middle) group and the data relevant to such an individual is likely to be of great value. Diagrammatically:




                         Calibration
                         Sample (n)
              /               |               \
   obvious "fail"      neither obviously      obvious "pass"
       group           "pass" nor                 group
                       obviously "fail"

Classical model: makes use of $k_1+k_3$ observations.

Our model: makes use of $k_1+k_2+k_3$ observations.

Experience in the field yields the fact that frequently $k_2$ is larger even than $k_1+k_3$.

Interestingly enough, if it were possible to use all n members in the classical theory then the test function and discriminant vector would be algebraically the same, though the distribution of these quantities is different under the two models.

Given the discriminant vector ($\hat{\alpha}$) under our model, it may be treated as in the classical case to solve the problems:

(a) compare a new group of individuals with respect to their predicted future performance.

(b) evaluate the choice of admission tests (for example) as predictors of future performances.















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APPENDIX A 



Moments of $1-r^2$

(see Section 3.4; in particular equations (3.4.21) thru (3.4.24)).

Cases:

Sample size               n=4(1)20(5)50
Two groups                c=2
Group sizes               $k_1$=n-1, $k_2$=1
Vector dimension          p=1 *
Population correlation    $R_0$=0.05, 0.10(0.10)0.90, 0.95

(* The moments for general p are obtainable from the results for p=1 using (3.3.12)).






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R == .05 

  N     $\mu_1$       $\mu_2$       $\mu_3$        $\mu_4$

  4    .66629885    .08895878    -.01690310    .01693457
  5    .74958306    .06260402    -.01562485    .01173170
  6    .79956186    .04583089    -.01220669    .00799750
  7    .83288820    .03484069    -.00928319    .00551737
  8    .85669835    .02732611    -.00709449    .00388681
  9    .87456033    .02198470    -.00549568    .00280049
 10    .88845637    .01806065    -.00432333    .00206177
 11    .89957588    .01509675    -.00345275    .00154325
 12    .90867579    .01280494    -.00279640    .00118361
 13    .91626079    .01099708    -.00229391    .00091951
 14    .92268027    .00954626    -.00190356    .00072473
 15    .92818386    .00836449    -.00159617    .00057867
 16    .93295460    .00738922    -.00135107    .00046749
 17    .93712984    .00657506    -.00115340    .00038171
 18    .94081456    .00588841    -.00099228    .00031468
 19    .94409048    .00530400    -.00085971    .00026172
 20    .94702208    .00480249    -.00074966    .00021942
 25    .95802067    .00311244    -.00040932    .00010046
 30    .96523199    .00218010    -.00024731    .00005226
 35    .97032557    .00161189    -.00016065    .00002981
 40    .97411523    .00124017    -.00011018    .00001822
 45    .97704506    .00098375    -.00007883    .00001176
 50    .97937802    .00079943    -.00005833    .00000792




$R_0$ = .10

  N     $\mu_1$       $\mu_2$       $\mu_3$        $\mu_4$

  4    .66519224    .08916696    -.01681743    .01694415
  5    .74832911    .06291431    -.01562261    .01176969
  6    .79824454    .04617874    -.01225336    .00805132
  7    .83155015    .03519404    -.00935303    .00557540
  8    .85536243    .02766973    -.00717279    .00394269
  9    .87323911    .02231176    -.00557472    .00285149
 10    .88715682    .01836858    -.00439926    .00210703
 11    .89830171    .01538508    -.00352384    .00158786
 12    .90742881    .01307427    -.00286201    .00121804
 13    .91504164    .01124849    -.00235400    .00094937
 14    .92143893    .00978103    -.00195838    .00075061
 15    .92701985    .00858392    -.00164610    .00060114
 16    .93181723    .00759459    -.00139653    .00048705
 17    .93601824    .00676757    -.00119481    .00039878
 18    .93972782    .00606915    -.00103006    .00032963
 19    .94302763    .00547397    -.00089422    .00027484
 20    .94598217    .00496260    -.00078124    .00023099
 25    .95708204    .00323404    -.00043018    .00010692
 30    .96437571    .00227569    -.00026174    .00005613
 35    .96953717    .00168915    -.00017105    .00003227
 40    .97338367    .00130403    -.00011792    .00001986
 45    .97636181    .00103750    -.00008475    .00001289
 50    .97873636    .00084537    -.00006297    .00000873
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.66071740 
.74326541 
.7929^059 
.82615717 
.84998176 
.86792080 
.88192827 
.89317741 
.90241566 
.91014197 
.91670233 
.92234426 
.92724962 
.93155504 
.93536518 
.93876162 
. 94180885 
.95331741 
.96094278 
.96637732 
.97045230 
.97362450 
.97616605 



".08997692 

.06412813 

.04754000 

.03657602 

.02901256 

.02358888 

.01957007 

.01650935 

.01412382 

.01222767 

.01069492 

.00943775 

.00839338 

.00751604 

.00677163 

.00613440 

.00558455 

.00370587 

.00264631 

.00198853 

.00155137 

.00124564 

.00102320 



tl 

-.01645475 

-.01558670 

-.01241054 

-.00960289 

-.00745794 

-.00586483 

-.00467915 

-.00378651 

-.00310484 

-.00257666 

-.00216165 

-.00183133 

-.00156525 

-.00134856 

-.00117033 

-.00102238 

-.00089854 

-.00050769 

-.00031539 

-.00020968 

-.00014669 

-.00010676 

-.00008021 



.01697529 

.01190896 

.00825149 

.00579218 

.00415186 

.00304250 

.00227664 

.00173633 

.00134712 

.00106129 

.OOOS4765 

.00063541 

.00056039 

.00046279 

.00038568 

.00032409 

.00027440 

.00013117 

.00007068 

.00004152 

.00002603 

-.00001717 

.00001130 







' &5 



R = .30 
o 



■ h. 








.65308021 


. 09124084 


-.01577892 


.01700190 


.73464938 


.06604872 


-.01543113 


.01209680 


.78390998 


c 04:9696^7 


-.01256816 


.00853274 


.81701954 


.038762288 


-.00991561 


,00610067 


.84087892 


• .03113398 


-.00733478 


.00445112 


.85893486 


.02560307 


-.00625729 


.00331649 


.87310345 


.02146193 


-.00506251 


.00252028 


.88453648 


.01827702 


-.00414898 


.00194928 


.89396885 


.01577178 


-.00344155 


.00153278 


.90189209 


.01376330 


-.00288639 


.00122235 


.90864776 


.01212663 


-.00244509 


.00098738 


.91448075 


.01077405 


-.00209004 


.00030676 


.91957142 


.00964246 


-.00180120 


.00066605 


.92405561 


.00868549 ' 


-.00156380 


.00055506 


,92803760 


.00786844 


-.00136684 


.00046650 


.93159889 


.00716487 


-.00120205 


.00039512 


.93480405 ' 


.00655440 


-.00106306 


.00033705 


.94700660 


.00443989 


-.00061658 


.00016624 


.95519311 


.00322193 


-.00039081 


.00009178 


.96108847 


.00245295 


-.00026404 


.00005493 


.96554832 


.00193472 


-.00018718 


.00003502 


.96904697 


.00156801 


-.00013776 


.00002342 


« 97186917 


.00129848 


-.00010449 


.00001623 
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= .40 
o 



N 


iii 


^2 


^3 




4 


. 64188408 


.09280952 


-.01468235 


.01698691 




' .72218168 
« 


.06850159 


-.01501310 


.01227014 


6 


.77091606 


.05245990 


-.01257254 


.00882163 


•7 


• .80388620 


.04156191 


-.01014011 


.00642766 


8 


. .82783712 


.03384293 


-.00816136 


.00477265 


9 


.84608633 


.02816856 


-.00662182 


.00361295 


10 


.86050658 


.02386575 


-.00543127 


.00278500 


11 


.87221984 


.02051797 


-.00450490 


.00218234 


12 


.88194385 


.01785672 


-.00377661 


.00173548 


13 


.88016014 


.01570252 


-.00319745 


. .00139847 


14 


.89720443 


.01393161 


-.00273163 


.001140:2 


15 


.90331829 


.01245623 


-.00235289 


.00093978 


16 


.90868022 


.01121269 


-.00204184 


.00078200 


17 


.81342516 


.01015382 


-.00178398 


.00065641 


18 


.91765707 


.00824401 


-.00156838 


.00055537 


18 


.82145749 


.00845594 


-.00138668 


.00047331 


20 


.82489124 


.00776837 


-.00123242 


.00040635 


25 


.9380934 


.00535533 


-.00072931 


.00020537 


30 


.94708352 


.00393801 


-.00046915 


.00011524 


35 


.95363626 


.00302963 


-.00032062 


.00007000 


40 


.85864367 


.00241008 


-.00022839 


.00004510 


45 


.86260579 


.00196731 


-.00017012 


.00003045 


50 


.86582584 


.00163916 


-.00012986 


.00002134 
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R = .50 

  N     $\mu_1$       $\mu_2$       $\mu_3$        $\mu_4$

  4    .62694719    .09442490    -.01301595    .01687940
  5    .70542953    .07119261    -.01413762    .01234705
  6    .75352781    .05552012    -.01221773    .00902538
  7    .78641044    .04466144    -.01007743    .00663244
  8    .81052067    .03683511    -.00825350    .00503366
  9    .82907557    .03099326    -.00679155    .00385888
 10    .84386925    .02650391    -.00563587    .00300751
 11    .85598596    .02296990    -.00472085    .00237956
 12    .86612229    .02013155    -.00399115    .00190851
 13    .87474782    .01781300    -.00340383    .00154957
 14    .88219147    .01589146    -.00292668    .00127211
 15    .88869117    .01427893    -.00253524    .00105481
 16    .89442363    .01291091    -.00221124    .00088257
 17    .89952304    .01173912    -.00194079    .00074457
 18    .90409336    .01072686    -.00171327    .00063288
 19    .90821644    .00984572    -.0015204     .00054166
 20    .91195771    .00907347    -.00135595    .00046653
 25    .92649585    .00633451    -.00081335    .00023942
 30    .93655130    .00470128    -.00052829    .00013617
 35    .94397103    .00364276    -.00036369    .00008349
 40    .94969792    .00291450    -.00026171    .00005422
 45    .95426738    .00239039    -.00019501    .00003684
 50    .95800762    .00199968    -.00014945    .00002596









0 

tl 

• 

O 


\ 






1^2 


!i3 


^4 


. 60720292 


.09564254 


-.01060029 


.01661229 


.68364340 


.07363308 


-.01257337 


.01223138 


,73110269 


.05837772 


-.01126605 


.00904058 


,76399639 


.04757244 


-.00950388 


.00676714 


,78843174 


.03964300 


-.00790885 


.00514691 


,80746322 


.03363618 


-.00658680 


.00397867 


,82280163 


.02896346 
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.00312315 
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.02524751 
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.00248641 


84619203 
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-.00396284 
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,85537339 
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-.00255257 


.00112247 


87658847 
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-.00223415 


.00094243 


88215335 
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.58149081 


. 09566119 
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. 01608673 - 
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. 01180990 s 


.70264670 
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-.00822473 


.00658402 
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.81022763 
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-.00420921 
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.02365937 
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.84802405 
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-.00234520 
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.85488341 
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. 86104983 


.01414991 


-.00181628 
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.01194691 


-.00143597 
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.90785494 


.00583254 


-.00051349 
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.00454670 


-.00035609 
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.92545222 


.00365513 


-.00025767 


. 000061 ' ^ 


.93170779 


.00300953 


-.00019285 


.00004190 


.93683553 


.00252583 


-.00014832 


.00002970 








= .80 
o 






.54755326 


.09289019 


.61961834 


.07368696 


.66653473 


.05957676 


.70046233 


.04919442 


.72658416 


.04137517 


.74754462 


.03534283 


.76486361 


.03058774 


.77949102 


.02676872 


.79205805 


.02365145 


.30300418 


.02107099 


.81264682 


.01890847 


.82122222 


.01707651 


.82891045 


.01550965 


.83585167 


.01415804 


.84215688 


.01298316 


.84791525 


.01195483 


.85319948 


.01104913 


.87429365 


.00780126 


.88943968 


.00583553 


.90093117 


.00454821 


.90999634 


.00365561 


.91735857 


.00300930 


.92347440 


.00252511 





A 






-.00302091 


.0150*9842 


-.00667500 


.01091016 


-.00637570 


.00803555 


-.00620323 


.00602180 


-.00537659 


.00459141 


-.00460324 


.00355895 


-.00393447 


.00280104 


-.00337221 


.00223540 


-.00290390 


.00180661 


-.00251428 


.00147682 


-.00218927 


.00121978 


-.00191690 


.0010-1698 


-.00168743 


.00085517 


-.00149299 


.00072474 


-.00132730 


.00061861 


-.001135330 


.00053150 


-.00106296 


.00045942 


-.00065095 


.00023928 


-.00042851 


.00013761 


-.00029775 


.00008515 


-.00021572 


.00005569 


-.00016157 


.00003807 


-.00012432 


.00002697 






I 






R = .90 
o 



[L , 









A 



. 5 U 054909 


.08351268 


.00131107 


;.01295930 


.57159125 


.06638048 


-.00295492 


.00906511 


.61977443 


.05362577 


-.00393690 


.00655672 


. 65563070 


.04420761 


-.00389605 
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.68379691 
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-.00354542 


.00367949 
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.00283785 
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-.00272924 


.00222519 
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.02390334 


-.00237387 


.00177064 


.75629644 


.02109398 


-.00206682 


.00142753 


.76864955 


.01877184 


-.00180474 


.00116462 


.77958040 : 


.01682838 


-.00158197 


.00096022 


.78933834 


.01518396 


-.00139258 


.00079931 


.79811536 


.01377902 


-.00123120 


.00067117 
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.01256826 


-.00109322 


.00056805 


.81329870 


.01151674 
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.01059713 
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.00041559 


.82601307 


.00978779 


-.00078417 
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.85043922 


.00689070 
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,00018593 


.86808662 


.00514197 


-.00031899 


.00010651 


. 88153582 


.00399923 


-.00022184 


.00006564 


.89218191 


.00320836 


-.00016071 


.00004279 


.90085202 


.00263665 


-.00012029 


.00002915 


.90807089 


.00220898 


-.00009247 


.00002060 






.46833526 


.07304560 


.54042725 


.05795840 ‘ 


.59052062 


.04675914 


.62828533 


.03851306 


.65817054 


.03231503 


.68261051 


.02754534 


.70308464 


.02379441 


.72055807 


.02078815 


.73569342 


. 0183'3871 


.74896378 


.01631423 


.76071787 


. .01461999 


.77121937 


.01318651 


.78067180 


• .01196184 


.78923533 


.01090651 


.79703792 


, .00999005 


.80418322 


.00918865 


.81075625 


.00848343 


,83714535 


.00596016 


.85623925 


.00443864 


.87080740 


.00344556 


,88234989 


.00275915 


,89175730 


.00226358 


89959528 


.00189334 



.00224984 


.01061373 


-. 00189192 . 


.00729577 


-.00301085 


.00524249 


-.00312577 


. 003 S 7 S 66 


-.00290829 


.00293618 


-.00259868 


.00226599 


-.00223504 


.00177823 


-.00199808 


.00141611 


-.00174610 


.00114252 


-.00152878 


.00093258 


-.00134270 


.00076923 


-.00118369 


.00064051 


-.00104767 


.00053792 


-.00093101 


.00045531 


-.00083063 


.00038815 


-.00074391 


.00033307 


-.00066871 


.00028754 


-.00041224 


.00014831 


-.00027188 


.00008504 


-.00018885 


.00005223 


-.00013662 


-00003399 


-.00010210 


.00002311 


-.00007837 


.00001628 







APPENDIX B 




Values of 



(Pr; a,b) = ^ 

P(Xjr)) 



(r) 



where $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ is an ordered sample from the standard normal density with p.d.f. Z(x) and c.d.f. P(x). (See section 3.6).



We write 
(Pr; a,b) = 



i=0 



H. (Pr; a,b) (n+2) 



-1 



B\ 



and tabulate 



H^(Pr; a,b) 



i=0(1)5

$p_r$=0.50(0.05)0.95
$1 \le a+b \le 4$.
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